An Analysis of Students’ Statistical Understandings



In our past work we have developed an approach to instructional design that is 
generally consistent with the theory of Realistic Mathematics Education (RME) developed at the 
Freudenthal Institute in the Netherlands. It can be characterized as a “bottom up” approach in that 
the designer’s goal is to support students’ progressive mathematization of their initially informal, 
pragmatic problem-solving activities in experientially-real situations (Gravemeijer, 1994). The first 
phase of an instructional sequence involves students in an exploration of several problem situations 
set within a context that is real to them. During this initial phase it is important that students 
develop a genuine need to construct informal mathematical arguments. In the second phase, 
instructional activities are developed to support students’ development of models of their initially
informal mathematical activity. These models might involve the use of physical materials and
computer-based tools and can result in the development of pictures, diagrams, charts, and
non-standard and conventional notations. In the third phase of a sequence, students begin to generalize
their informal models. The instructional activities at this phase are designed to make it possible for 
these models of informal activity to take on a life of their own and become models for increasingly 
abstract mathematical reasoning that remains rooted in situation-specific imagery. In the final phase 
of a sequence, the generalized models are considered from a more mathematical point of view. This 
approach to design involves conjecturing both possible learning trajectories for students and means 
of supporting students’ development along these provisional trajectories. The primary elaboration 
we have made to the RME approach in our work involves locating students’ conjectured 
mathematical activity in social context, thereby explicating the assumptions that the designer 
necessarily makes about the classroom microculture within which students act and interact. 

In our current work, we are refining our approach to instructional design in the context of 
students’ development of statistical thinking in seventh and eighth grade. As part of the pilot work 
for this project, we read the literature on statistics teaching and learning (see Appendix) in order for 
us to clarify what the “big ideas” should be in statistics at the middle-school level. There are 
actually only a handful of studies available that focus on students’ statistical understandings. These 
studies fall into two categories: (1) studies that examine students’ understanding of the mean and 
(2) studies that examine students’ statistical understandings in the context of data analysis. All of
the early research focuses on students’ misunderstandings and misconceptions of the mean (e.g.,
Mevarech, 1983; Pollatsek, Lima, & Well, 1981; Strauss & Bichler, 1988). More recently,
researchers have studied how students use the mean to summarize and compare data sets (e.g.,
Gal, Rothschild, & Wagner, 1990; Mokros & Russell, 1995). These studies
emphasize that traditional instruction may provide students with the appropriate algorithm for the 
mean, but leave them with an incomplete conceptual understanding. An emerging research trend is 
focusing on studies where students are involved in the more complex activity of data analysis 
(e.g., de Lange, van Reeuwijk, Burrill, & Romberg, 1993; Hancock, Kaput, & Goldsmith, 1992; 
Jacobs & Lajoie, 1994; Konold, Pollatsek, Well, & Gagnon, in press; Lehrer & Romberg, 1996). 
Typically, these studies outline the process by which students analyzed and reasoned about data in 
more innovative instructional approaches. Clearly, the two categories of studies highlight different 
aspects of statistics instruction. The first set of studies emphasizes the mathematical content (the
mean) and the second set documents the mathematical process involved in data analysis.
We believe it is crucial to transcend this dichotomy between content and process by developing an 
instructional approach that focuses simultaneously on data analysis and on mathematical content. 

In the context of data analysis, the mathematical content in statistics must move beyond 
simply understanding the mean to more unifying big ideas. For example, several authors stress the 
importance of students coming to view data as an entity as opposed to a collection of individual 
data points (Hancock et al., 1992; Konold et al., in press; Mokros & Russell, 1995). One idea that 
helps identify what might be involved in viewing data as entity is that of a space of potential data 
values. In particular, we conjecture that students who view data as entity see the individual data 
points as located within a space of possible values. As an example, Hancock et al. document that 
the students in their study rarely used the axis plot option of TableTop even though they had used 
the software to conduct data analyses for a year and could explain the meaning of the icons when 
shown on axis plots. They suggest that the very thing that made the axis plot powerful (the fact
that it corresponded to a space of possible values, rather than to a single value) also made it harder
to understand. In other words, since the students did not conceptualize the individual data points as 
located in the space of all possible values, the possibility of using an axis plot did not occur to 
them. 



A similar analysis holds in the case of Konold et al.’s observation that the students in their study 
rarely used the histogram option of the DataScope software. A histogram involves structuring the 
space of all possible data values into equal intervals. These examples illustrate the importance of a 
space of potential values as a big idea in statistics instruction. 

A second big idea, closely related to the first, that came to the fore in our reading of the 
literature is that of group propensity (Konold et al., in press). In defining group propensity, 
Konold et al. refer to the rate of occurrence of some data value within a group that varies across a 
range of data values. For example, the data value in question might be that of being a boy rather 
than a girl. Unless individual data points are located within a space of possible data values in which 
they can take on the values of boy or girl, the propensity of being a boy cannot be formalized as, 
say, 65%. As Konold et al. observe, the possibility of comparing groups in terms of means or 
relative frequencies did not occur to the majority of students in their study when they conducted 
data analysis. This can be accounted for in terms of their lack of understanding of the big ideas of a 
space of potential values and of group propensity. The development of these two big ideas together
constitutes a major step towards understanding the statistical concept of distribution and is,
therefore, mathematically significant. We have also pointed out the crucial role these two big ideas
play in data analysis. For us, these two big ideas bridge the dualism between content and process. 

In order to support students’ development of an understanding of data analysis along with 
an understanding of the big ideas in statistics, it is important to develop instructional sequences 
which (1) build on students’ current understandings and (2) support shifts in their current ways of 
reasoning. As part of our pilot work for our current project, we conducted classroom performance 
assessments in order to obtain baseline data on students’ current statistical understandings. The 
assessments were conducted during the fall semester of 1996 in three sessions of a seventh-grade 
class. During the sessions, a former middle-school teacher who was a member of the research team 
posed tasks to the students as they worked together in groups. The tasks were designed to provide 
information about students’ current understandings of (1) the mean and (2) graphical 
representations of data (inscriptions) because these two topics were the focus of the statistics
chapter in the textbook series used by the students in their previous instruction. By focusing on
students’ current ways of reasoning, we could design subsequent instructional materials that build
from their current knowledge. The purpose of this paper, then, is to document the analysis of these
performance tasks. This analysis will then serve to inform subsequent decisions concerning
instructional development. 



Results of Analysis 

The general format for the three mathematics class sessions in which the performance tasks 
were conducted was a whole-class introduction to the task, student collaboration in small groups, 
followed by a whole-class discussion of their solutions. Students worked in groups of
3 to 6 students, and the number of groups sometimes varied from task to task. In the
following sections of this paper, we will begin by describing the context of the task, the design 
decisions underlying the task, and our anticipations of how students would respond to the task. 
Second, the small-group work is analyzed in order to highlight the various solution methods. 
Finally, the whole-class discussions are analyzed to clarify students’ understandings. 



Task 1: Spare Time 

In the first task, shown in Figure 1, students were given a list of 24 activities collected from
a survey in which their classmates were asked what they liked to do in their spare time. Students 
were asked to organize the list so that the principal could more easily present the results on a 
bulletin board for Parent’s Night. 



You have collected the following information from your classmates concerning what
they like to do in their spare time. Organize the information so that the principal can
more easily present the results on a bulletin board for Parent’s Night.

read
watch MTV
watch re-runs of the X Files
exercise
listen to music
play volleyball
ride my bike
read comic books
watch old movies
write letters
watch TV
clean up my room
practice my guitar
talk on the phone
jog
trade baseball cards
shoot basketball
play with the dog
play the piano
go to the movies
work on my stamp collection
watch sports on TV
hang out at the mall
listen to the Grateful Dead



Figure 1. Task Posed About How Students Spend Their Spare Time. 



This task was designed with categorical data so students could not use the mean. With the mean 
eliminated, we wanted to see if students would then consider using inscriptions with this data set
or if they would organize the data in some other way. We anticipated that students would not use 
inscriptions, but would form categories. We also thought that some topics might be problematic for 
students because we could imagine them being placed in more than one category. For example, 
watch MTV could be placed in a category entitled Watch TV or Listen to Music. As it turned out, 
this issue was not a problem for most of the students. 

All the groups of students began this task by organizing the data under various headings.
We would say, then, that the interpretation of this task as one of creating headings to organize the
data was taken-as-shared. As a result, the whole-class discussion focused on students’ different
ways to form the headings.

Group Work 

In analyzing the small group work, we have identified three different solution approaches 
developed by the eight groups of students: (a) creating categories, (b) creating clusters and (c) 
noting relative frequencies. Students who offered solutions that we call categories made two or 
three broad headings and then listed each topic from the task under one of these category headings. 
In our view, to form a category means making a distinction such that every item falls on one side 
or the other of that distinction. In other words, categories are mutually exclusive and partition the 
data. Three of eight groups of students formed categories. 

The groups who approached the task by forming categories began by examining the data 
and forming initial categories which essentially were conjectures of how they anticipated the data 
could be partitioned. By creating these initial categories the students explicitly formed a set of 
criteria for the category which allowed them to justify their placement of topics into each category. 

As they then began to place each topic within a category, they sometimes had to adjust the initial 
categories in order to place more data items within them. For example, one group began with initial 
categories of Physical Activities, Reading and Writing, and TV. As they worked through the list of 
topics, they identified three data items (play piano, play with the dog, and trade baseball cards) that
did not unambiguously fit into any of their initial categories. To resolve this difficulty, they 
changed the second category to include hobbies which then allowed them to add these three data 
items. Making this adjustment to the second category explicitly changed the set of criteria for the 
category. This then allowed the group of students to justify the addition of the new data items. 



Students who made what we called clusters created between 6 and 8 broader headings
which essentially were a more succinct list of topics describing what the students did in their spare
time. For us, forming a cluster means simply finding broad headings that define a somewhat
smaller subset of the data set. This is different from creating a category in that forming a cluster
does not involve making a distinction such that each item falls on one side or the other of that
distinction. In other words, clusters are not mutually exclusive. Four of eight groups of students
formed clusters.
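
The distinction between categories and clusters can be illustrated with a minimal sketch. The Python below is an illustration only and is not part of the classroom analysis: the helper name `is_partition`, the subset of topics, and the headings and placements are all hypothetical. It checks whether every topic falls under exactly one heading, which is the sense in which categories partition the data and clusters need not.

```python
def is_partition(groups, topics):
    """Return True if every topic appears under exactly one heading."""
    counts = {topic: 0 for topic in topics}
    for members in groups.values():
        for topic in members:
            if topic not in counts:
                return False  # a heading lists something that is not in the data
            counts[topic] += 1
    return all(n == 1 for n in counts.values())


# Hypothetical subset of the Spare Time topics, for illustration only.
topics = ["read", "watch MTV", "listen to music", "jog", "practice my guitar"]

# Category-style headings (assumed): each topic falls under exactly one heading.
categories = {
    "Physical Activities": ["jog"],
    "Reading and Hobbies": ["read", "practice my guitar"],
    "TV and Music": ["watch MTV", "listen to music"],
}

# Cluster-style headings (assumed): a topic may fit under more than one heading.
clusters = {
    "Watch TV": ["watch MTV"],
    "Music": ["watch MTV", "listen to music", "practice my guitar"],
    "Hobbies": ["read", "practice my guitar"],
    "Physical Activity": ["jog"],
}

print(is_partition(categories, topics))  # True
print(is_partition(clusters, topics))    # False
```

In this sketch the cluster headings fail the check because watch MTV and practice my guitar can each be accommodated by more than one heading.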

The groups who formed clusters began by inspecting the topics and identifying several that 
were related in some way. They would then cross these topics off the list and create one cluster 
heading that encompassed this group of similar topics. The students continued to make clusters 
(between 6 and 8) until all topics were crossed off the list. This approach further differs from 
forming categories in that these students did not form initial conjectures about how to partition the 
data. Instead, they partitioned the data set in action as they examined the individual topics. 
Consequently, they did not formulate explicit criteria that could serve to justify their placement of 
topics into one cluster rather than another. For example, one group that formed clusters completed 
the task and then noticed that the topic of practice the guitar was not crossed off their list. In the 
subsequent discussion, the clusters of Hobbies, Physical activity, and Music were all suggested as 
possible placements for the topic of practice the guitar. In forming their clusters, this group of 
students had not explicitly created a set of criteria that would allow them to justify the placement of 
topics within the cluster. Therefore, the specific placement of the topics was not an issue. It was 
only important that a cluster exist that could accommodate the topic. Depending on the individual 
student's interpretations, practice the guitar could be placed within at least three different clusters. 

The final solution approach we call relative frequencies. The one group of students who 
formed relative frequencies created categories, but instead of listing the topics from the task under 
each category, they recorded the fraction of the topics that fell into each category. In other words, 
they mathematized the partitionings they had created. Although only one of eight groups solved the 
problem in this way, we feel the difference in approach is highly significant as this group of 
students looked at the problem quantitatively. For example, beside the category heading Watching 
TV, they wrote 4/24, indicating that 4 of the 24 topics fell into this category. This indicates that 
these students interpreted the categories they formed as, at least, additive frequencies. 

It is important here to clarify the distinction between additive and multiplicative frequencies.
To interpret 4/24 additively means that these students partitioned the whole (24) and one of those 
partitions contained four items. These four items are viewed as a subset of the whole and the 
relationship between the part and the whole is fixed. To interpret 4/24 multiplicatively, they would 
have to think about the relationship between the part and the whole as some sort of change or 
variation is introduced into the task. For instance, if more topics were added to the problem, we 
could ask the students to predict how many of the new topics would fall into a certain category. 
Another possibility would be if we asked the students to compare a category containing 4/24 topics 
with a category from another survey containing 7/45 topics. In these instances the students would 
have to think about the relationship between the part and the whole as they vary in the first example 
and to compare data sets with an unequal number of items in the second example. The relationship 
between the part and the whole is no longer static and more than part-whole reasoning would be 
required; proportional reasoning would be necessary. Reasoning in this way is central to what
Konold, Pollatsek, Well, and Gagnon (in press) refer to as a statistical perspective: attending to
features of the data as an entity (the aggregate) as opposed to features of individual data points. The
limited nature of this task does not allow us to differentiate between additive and proportional
reasoning, which is why we can only claim that the students in this group established additive
relative frequencies.
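
The contrast between the two readings can be worked through with the fractions mentioned above. The following Python sketch is illustrative only. The counts 4/24 and 7/45 come from the text, while the enlarged list of 36 topics is a hypothetical value chosen for the prediction example.

```python
from fractions import Fraction

# Part-whole counts taken from the text: 4 of 24 topics versus 7 of 45 topics.
first = Fraction(4, 24)
second = Fraction(7, 45)

# Additive reading: compare only the sizes of the parts (4 items versus 7 items).
larger_part = "second" if 7 > 4 else "first"

# Multiplicative reading: compare each part relative to its own whole.
larger_share = "first" if first > second else "second"

# Proportional prediction for a hypothetical enlarged list of 36 topics.
predicted = first * 36

print(first, second)              # 1/6 7/45
print(larger_part, larger_share)  # second first
print(predicted)                  # 6
```

Under the additive reading the second category looks larger because it contains more items, whereas under the multiplicative reading the first category accounts for the greater share of its survey.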

Whole-class Discussion 

As the groups that formed clusters shared their solutions in whole-class discussion, they 
questioned one another about the usefulness of their clusters framed against their desire to reduce 
the number of clusters. The first group that explained their reasoning had formed 11 clusters and
other students argued that two or more of those clusters could be combined under a broader 
cluster. For example, the group that formed 11 clusters had Listen to Music and Playing Musical 
Instruments as two different clusters. A student suggested that these two clusters could be 
combined into one cluster entitled Music. The discussion continued with several other suggestions 
of combining clusters. As groups that formed categories shared their solutions, this same type of 
discussion did not occur since they only had two or three categories. 

In the on-going discussion, the students did not question the placement of individual topics 
within particular clusters even when prompted by the teacher. For example, the teacher explicitly 
raised this issue by asking the first group that shared their solution where they placed watch MTV.
From our perspective, watch MTV could have been placed under their cluster of Watch TV or
Listen to Music. The group stated that they had placed it under Watch TV and there was no further
discussion. It therefore appears that, from the students’ perspective, there was no need to justify
how they had organized the data; they did not feel it was necessary to develop publicly accepted
criteria for their decisions. The focus of the whole-class discussion was on the procedure of
forming categories and clusters rather than the reasoning underlying the placement of topics within
particular categories and clusters.

Although we mentioned three different solution approaches to this task, the teacher, in 
action, judged that there were only two. Only in retrospect did we recognize the significance of the 
relative frequency approach. Because of this, only the two approaches that we call categories and 
clusters were addressed in the whole-class discussion. As a result, the teacher summarized the two 
types of solutions that were highlighted and then attempted to focus the discussion on the 
differences between these two approaches. She pointed out that some groups had a list for the 
Principal of between 6 and 8 headings (clusters) without the topics included. The other groups, she 
noted, had a list of only 2 or 3 headings (categories) but they had included each topic underneath. 
Similar to the previous discussion on the placement of individual topics, the students did not 
appear to see any significance in the difference between the approaches and there was no further 
discussion. Since the students had rarely made any reference to either the task setting or how their 
data organizations would be used, it could be argued that they perceived the task as merely a 
procedure which lacked any mathematical rationale. They simply interpreted this as a task they
were to complete by grouping topics. If this was the case, then these students would have no
means of evaluating the two approaches other than their personal preferences and there would be
no need to discuss the issue.



Task 2: How Much TV? 

In the second task, which is shown in Figure 2, students were given results from a survey
of 30 students in which they were asked how much television they watch in one week. The
students were asked to summarize and present the data in some form so that when posted on the 
bulletin board parents would be able to quickly understand the results. 

Below are the results of a survey taken of 30 seventh graders to find out how many
hours of television they watch in a week. The Principal has asked you to summarize
and present this data in some form so that parents will be able to understand it quickly
when it is posted on the bulletin board. The Principal also asks you to write a short
report for parents explaining what the data shows.

Figure 2. Task Posed About How Much TV Students Watch.



In designing this task, we were interested to see if students would summarize the data set by using 
an inscription or the mean. The data set was purposely designed with a large range (0 - 23) which, 
for us, would make the mean an inappropriate summary for this task situation. We anticipated that 
students would use both approaches to summarizing the data despite the large range. All the groups 
but one offered a solution that included an inscription. 

Group Work 

Ultimately, only one of the groups solved this task by finding the mean of the data set. For 
this group, the task appeared to be to summarize a large set of numbers. These students’ prior 
experiences in school mathematics could have influenced their interpretation of the task. Typically, 
in traditional school mathematics, a large set of numbers is summarized by calculating the mean. 

The remaining groups made some sort of graph to describe how much television was watched 
although some of these groups discussed the appropriateness of using the mean. 

One group in particular had a very intense discussion about whether or not the mean was an 
appropriate way to represent the data set. One student in the group, Trent, wanted to use the mean 
because the task for him was to tell about the 30 students “altogether” and that, he argued, is 
precisely what the mean did. Others in the group argued that since some students only watched 1.5 
hours of TV, then the average of 10.56 is “way off” and “It’s so off, you cannot use the answer.”
The group’s discussion seemed to focus on whether or not you could actually use the mean and not
on whether or not the mean was an appropriate representation of this data set. After several minutes
of discussion, it was not clear to us whether the students felt that the mean was inappropriate
because of the high variability in the data set or because some of the data points were
so far from the mean that it did not work for this data set. Clearly, Trent and the members of his
group had different notions of the mean. Since the group could not come to an agreement, Trent 
decided to use the mean and the others in the group decided to make a graph which is shown in 
Figure 3. 
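
The students’ sense that the mean is “way off” for some individuals can be made concrete with a small calculation. The Python sketch below is illustrative only: the data are hypothetical stand-ins, since the survey data from Figure 2 are not reproduced here, and so the result does not match the mean of 10.56 reported by the students.

```python
from statistics import mean

# Hypothetical hours of TV per week; not the survey data from Figure 2.
hours = [0, 1.5, 4, 8, 10, 12, 15, 20, 23.5]

average = mean(hours)
spread = max(hours) - min(hours)

# Distance of each value from the mean: how far "off" the summary is for that student.
deviations = [abs(h - average) for h in hours]

print(round(average, 2))          # 10.44
print(spread)                     # 23.5
print(round(max(deviations), 2))  # 13.06
```

With a range this large, the mean summarizes the group “altogether,” as Trent argued, while lying far from several individual values, which is the feature the other students objected to.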

In making their graph, the rest of Trent’s group decided to organize the data into categories 
from 0 to 10, 10 to 20, and 20 to 30. One student in the group then pointed out that the “highest on
here is 23.” As a result, the group changed the upper bound in the last category to 25. We find this
interesting since the highest data point is 23.5 and changing the upper bound of the last category
from 30 to 25 did not change the height of the bar. It did, however, result in unequal intervals. As
the group began to draw their graph, they changed the category labels to 0 to 10, 11 to 20, and 21
to 25. Even though the graph is reminiscent of a histogram, we believe that for this group it was a 
bar graph over intervals. Changing the upper bound from 30 to 25 indicates that this group of 
students did not view this data as positioned within a space of potential data values. The way the 
group created the categories as described above indicates that they were partitioning the data items 
rather than creating intervals that fell along a continuum of potential data values. The spaces 
between the bars also indicate that each category was a separate subset of the data rather than
an interval in the space of possible data values.

Figure 3. Group One’s Graph for How Much TV?
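
The difference between intervals in a space of possible values and categories that partition the data items can be sketched as follows. This Python example is illustrative only. The data are hypothetical, since the survey data are not reproduced here, and the two helper functions are assumptions introduced for the sketch. The category labels 0 to 10, 11 to 20, and 21 to 25 are those reported above, read literally.

```python
# Hypothetical hours of TV per week; the survey data from Figure 2 are not reproduced here.
hours = [0, 1.5, 3, 5, 7.5, 9, 10.5, 12, 14, 15, 18, 20.5, 23.5]


def count_equal_intervals(values, width, upper):
    """Count values in equal-width intervals [k*width, (k+1)*width) covering 0..upper."""
    n_bins = int(upper // width)
    counts = [0] * n_bins
    for v in values:
        index = min(int(v // width), n_bins - 1)  # a value equal to upper goes in the last interval
        counts[index] += 1
    return counts


def count_labelled_categories(values, categories):
    """Count values lying within each labelled (low, high) category, bounds inclusive."""
    return [sum(1 for v in values if low <= v <= high) for low, high in categories]


equal = count_equal_intervals(hours, width=10, upper=30)
labelled = count_labelled_categories(hours, [(0, 10), (11, 20), (21, 25)])

print(equal)                       # [6, 5, 2]
print(labelled)                    # [6, 4, 1]
print(len(hours) - sum(labelled))  # 2
```

In this sketch the equal intervals cover every possible value between 0 and 30, whereas the labelled categories, read literally, leave possible values such as 10.5 and 20.5 with no place to go.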



The second group’s graph is shown in Figure 4. These students began by grouping the
data into categories from 0 to 5, 6 to 10, 11 to 15, 16 to 20, and 21 to 25. However, when they
drew their graph they did not make bars but simply put a line at the upper limit of each category.
Similarly, they did not list the categories (0 to 5, 6 to 10) when they labeled the axis on their final 
drawing, but instead wrote only the upper limit of the category. In addition, they used a large dot 
to mark the endpoint of the line. It is possible that this group viewed the data as distributed along a 
continuum, but we cannot verify this with the available information. If this were the case, we 
would classify their graph as a histogram even though it does not conform to the conventions for
drawing histograms. The most we can claim with the available information is that this group
categorized data items.

Figure 4. Group Two’s Graph for How Much TV?



The third group's graph is shown in Figure 5. This group began by ordering the data from 
the least number of hours of television watched to the greatest. They then decided to group the 
ranked data into categories. They chose to make five categories with six data points in each 
category. On the bar they wrote the range of the data points signified by each bar. This group 
created a bar graph in which each category, indicated by a bar, signified six data items. The
numbers written on the bars to record the values of the data items have gaps between them,
indicating that this group of students also did not view this data as distributed on a continuum
within a space of possible values.

Figure 5. Group Three’s Graph for How Much TV?
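
Group Three’s procedure of ordering the data and then forming five categories of six data points each can be sketched as follows. This Python example is illustrative only. The 30 values are hypothetical stand-ins for the survey data, and the helper `equal_count_groups` is an assumption introduced for the sketch.

```python
# Hypothetical data: 30 values standing in for the survey results, which are not reproduced here.
hours = [0, 1, 1.5, 2, 3, 4, 5, 5.5, 6, 7, 8, 8.5, 9, 10, 10.5,
         11, 12, 12.5, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 23.5]


def equal_count_groups(values, size):
    """Sort the values and split them into consecutive groups of `size` items."""
    ordered = sorted(values)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


groups = equal_count_groups(hours, size=6)

# The range written on each bar: the smallest and largest value in the group.
ranges = [(g[0], g[-1]) for g in groups]

# The gap between the end of one range and the start of the next.
gaps = [nxt[0] - cur[1] for cur, nxt in zip(ranges, ranges[1:])]

print(ranges)  # [(0, 4), (5, 8.5), (9, 12.5), (13, 18), (19, 23.5)]
print(gaps)    # [1, 0.5, 0.5, 1]
```

Because each range is defined by the data items that happen to fall in the group, the ranges do not tile the space of possible values. The gaps between them are the feature noted above.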

The fourth group’s graph is shown in Figure 6. This group began by rounding all the 
numbers to the nearest whole number and then ordering the data items. Their graph contained 12 
bars that indicated the hours of TV watched. This group of students seemed to view the data as 
individual items that could be ordered. We will say more about this group in the next section. 




Figure 6. Group Four’s Graph for How Much TV? (Axis label: Hours of TV Watching.)



Basically two approaches were used by the groups of students when creating their graphs. 
The first approach involved ordering individual data points. The students in Group Four 
exemplified this approach when they created their graph (see Figure 6). Creating categories for 
ordered data points is the second approach. Groups One, Two, and Three expressed this approach
in their graphs (see Figures 3, 4, and 5). In both of these approaches, the data set was viewed as a
plurality of individual data points that could be rank ordered and/or categorized. It appears that
none of the groups viewed the data set as a global whole that could be distributed along a
continuum. Had they done so, it is reasonable to assume they would have drawn a histogram. In
contrast to a bar graph over intervals, a histogram involves structuring the space of all possible data 
values into equal intervals. It is important to point out that simply drawing a histogram would not 
necessarily mean that students viewed the data set as an entity that could be distributed within a 
space of possible data values. We would also have to take into consideration how the students 
reasoned about the data and the resulting graph. 

Whole-class Discussion 

During the whole-class discussion, the four graphs previously described were presented on 
the board and then became topics of discussion. After the first group described their graph (see 
Figure 3), the teacher asked the rest of the class if they had questions or comments about this 
graph. There was no discussion. Next, the second group described their graph (see Figure 4) and 
the teacher then initiated a discussion by asking students to clarify the similarities and differences 
between the two graphs. 



Student 1: That’s (points to Group Two’s graph) a line graph and that’s (points to Group
One’s graph) a bar graph.
Teacher: That’s a line graph and this is a bar graph. Anything else?
Student 2: I thought it was supposed to be bars and not little lines... like bars.
Teacher: Why do you think it is bars?
Student 2: Because if you do a line it is supposed to go up at the time when it goes up and it
goes down when... it goes up when the rate is high and goes low...
Student 3: I think it is supposed to have bars when it goes vertical [sic] like that.
Teacher: What if she is calling these skinny bars? What if she is saying these are really just
skinny bars?
Student 4: With dots on the ends of them?
Student 5: With dots on them? I mean, you could do that but you wouldn’t have a line on
them, you would just have the dots.
Student 6: A line graph is supposed to be connected to other line segments.

It is significant that the focus of the discussion was on the form of the graphs and not on 
what they signified. These students were simply concentrating on what they remembered about the 
conventions for drawing graphs. This same type of superficial discussion occurred after the third 
group presented their graph (see Figure 5). Students were again asked to clarify similarities and 
differences of the graphs. In this discussion, the students’ contributions focused only on the 
direction of the bars. If we consider students’ prior school math experiences, this does not seem 
too unusual. In traditional mathematics instruction, a discussion about graphs would highlight the 
conventions for drawing graphs. This could explain the students’ emphasis on surface features like 
lines, bars, dots, positioning of bars, and connecting dots rather than on what those lines, bars, or 
dots signified and how that related to the task situation. These discussions were similar to the 
whole-class discussion for the Spare Time task where the students were interested in the 
procedures of forming clusters and categories rather than the reasoning underlying the placement of 
topics. 

As the fourth group was presenting their graph (see Figure 6) they decided to modify it by 
rounding all the data items to either 10 or 20, resulting in only two bars signifying the hours of TV 
watched. They did not finish drawing their new graph but described it for the other students. In 
their initial graph (shown in Figure 4) they ordered but did not categorize the individual data items. 
In their second graph, which they described as containing only two bars (one at 10 and one at 20), 
they categorized the ordered data items. It may be that as this group listened to the other three 
groups discuss how they made their graphs, they noticed that all three groups had categorized their 
data. It is possible that categorizing the data was becoming taken-as-shared in this classroom, which 
prompted this fourth group to change their graph to fit with this emerging interpretation. 

Tasks 3 & 4: Basketball All-Star & Trip Decision 
In the Basketball All-Star and Trip Decision tasks, students were asked to make a decision 
based on given sets of data. We discuss these tasks together since they were formatted in a similar 
manner. We designed these tasks in an attempt to gain information on how the students would deal 
with the issue of variability as it related to the mean. We anticipated that some groups would reason 
that the data set with the larger mean was the better choice without going back to the task situation 
and considering the impact of variability on the decision in these particular instances. We therefore 
designed the tasks so that the data set with the larger mean also had the greater variability. For us, 
then, the data set with the larger mean would not be the better choice because in these two 
situations (scoring in basketball and temperature ranges) consistency would be more important 
when making a decision. 

Basketball All-Star 

In the Basketball All-Star task shown in Figure 7, students were given a listing of the 
number of points scored by each of two basketball players in each of eight games. They were then 
asked to decide which player should be selected to play in the all-star tournament based on these 
scores. 



One player will be selected from the Meigs basketball team to play in the all-star 
tournament. Below is a listing of the points scored by the top two candidates for 
the last eight games of the season. Based on this information, present an 
argument to support the selection of one of the players. 

Player A: 11 31 16 28 27 14 26 15 

Player B: 21 17 22 19 18 21 22 20 



Figure 7. Task Posed About The Basketball All-Star Tournament. 

Group work. This task was approached in two distinct ways. The majority of students, 
five of eight groups, solved this task by calculating the total or the mean. The remaining three 
groups also initially found the mean or the total, but then they reconsidered the task situation and 
decided the player with the higher mean was not the better player for the tournament. For the five 
groups who calculated the total or the mean, this task was about totaling or averaging the points of 
each player and determining the winner by comparing the outcomes. Each of these groups selected 
Player A because he had the higher total and/or the higher average number of points. For these 
groups, the mean provided the best summary of the data regardless of the situation. If we 
consider the prior school experiences of these students, reasoning about the problem in this way 
seems reasonable. In traditional school mathematics a group of numbers is often summarized by 
calculating the mean. 

The other three groups of students initially began the task in the same way as the groups 
described above. However, they subsequently selected Player B even though his total points were 
lower than Player A’s. After calculating the total or the mean, these groups went back to the task 
situation and decided that the player with the higher mean was not necessarily the player that 
should be sent to the tournament. These discussions will be elaborated in the next section. 
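
The arithmetic behind the two approaches can be checked directly from the scores in Figure 7. The following is a minimal sketch and was not part of the task materials. The function name `summarize` is hypothetical, and the use of the range (highest score minus lowest score) as a measure of consistency is an assumption made for illustration, since the students reasoned about variability informally.

```python
from statistics import mean

# Points scored in the last eight games, as listed in Figure 7.
player_a = [11, 31, 16, 28, 27, 14, 26, 15]
player_b = [21, 17, 22, 19, 18, 21, 22, 20]


def summarize(scores):
    """Return the total, mean, and range (max - min) of a list of scores.

    The range is only an illustrative stand-in for "consistency";
    it is an assumption, not a measure used in the original task.
    """
    return {
        "total": sum(scores),
        "mean": mean(scores),
        "range": max(scores) - min(scores),
    }


summary_a = summarize(player_a)
summary_b = summarize(player_b)
print("Player A:", summary_a)
print("Player B:", summary_b)
```

Under these assumptions, Player A has the higher total (168 versus 160) and the higher mean (21 versus 20), while Player B’s scores span only 5 points compared with 20 for Player A. 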

Whole-class discussion. The teacher began the whole-class discussion by asking students 
to defend their choice of Player A or Player B as the one to send to the all-star tournament. The 
first group to share their thinking argued for sending Player B. 

Student: Our group said that you should send Player B to the tournament because he has 
a... even though Player A’s average [sic] is higher than his, which was only by 
eight points, he has a more steady... in his games... his points... they’re 
more... they’re not so up and down like Player A’s are... where one day... he goes 
from 11 points to 31 points so that’s why we said Player B. 



Only one other group shared their thinking and they argued for sending Player A to the 
tournament. However, their argument went beyond their reasoning about the mean. This group 
had been challenged by another group that was sitting at the same table and, as a result, expanded 
their argument. They created a situation to explain the variability in Player A’s scores and argued 
that Player A, in addition to having the higher average, was probably a “team player.” His low 
scores (11, 16, 14, 15) indicated that he was giving the ball to other players and he probably 
earned high assists in those games rather than high points. His high scores (31, 28, 27, 26) 
indicated that he helped the team out when they were falling behind. This group continued their 
support of Player A by creating a situation to downplay Player B’s consistent scoring. They argued 
that Player B had scores that were about the same, which indicated that he had a “personal goal” to 
achieve in each game and was not thinking of the team. This group believed the mean was the best 
summary of the data sets and the counterargument about consistency in scoring did not alter their 
position. In order to defend their choice of selecting the player with the higher mean, this group 
appeared to rely on their knowledge of basketball to create a narrative that would support their 
choice of Player A. 

Trip Decision 



In the Trip Decision task shown in Figure 8, students were given data on the temperatures 
in Boston for the third week of September and April for the past three years. The students were 
asked to use this data to decide if next year’s school trip to Boston should be taken in September or 
April. 



The students are planning to go to Boston during the school year in 1997-98. The Principal has 
[illegible] either the third week in September or the third week 
in April. You have been asked to research the weather during these times and make a 
recommendation about when to go to Boston. 

Below are the temperatures for each of these weeks for the past three years. Based on the 
[illegible] your choice. 














September: 
1996   77 71 80 75 73 79 77   average for week: 76 
1995   75 71 77 80 76 70 69   average for week: 74 
1994   79 71 74 79 77 73 72   average for week: 75 

April: 
1996   84 80 76 76 68 62 58   average for week: 72 
1995   79 83 87 90 89 86 81   average for week: 85 
1994   56 57 68 73 79 80 84   average for week: 71 



Figure 8. Task Posed About The School Trip. 



Group work. This task was approached in two distinct ways. Two of seven groups of 
students solved this task by calculating the mean. The remaining groups did not perform any 
calculations in determining their solution. The two groups that calculated the mean found the 
average of the three weekly averages given in the task for September and April, which were 75 and 
76, respectively. They selected April because it had the higher average and, for them, the average 
was the best summary of the two data sets. 

The remaining five groups did not perform any calculations as they reasoned about this 
task. Instead they focused on the daily temperatures presented in the task and formed arguments 
based on the consistency of the temperatures. For example, these groups suggested that September 
would be the better month for the class trip because the weather was more predictable. Their 
arguments will be discussed in more detail in the next section. 
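
The two ways of reasoning about this task can be compared numerically using the data in Figure 8. The following is a minimal sketch, not part of the task materials. The function names `mean_of_weekly_means` and `daily_range` are hypothetical, and reporting the lowest and highest daily temperatures is an assumed way of expressing the predictability that the five non-calculating groups argued about informally.

```python
from statistics import mean

# Daily temperatures from Figure 8, one list per week, keyed by year.
september = {
    1996: [77, 71, 80, 75, 73, 79, 77],
    1995: [75, 71, 77, 80, 76, 70, 69],
    1994: [79, 71, 74, 79, 77, 73, 72],
}
april = {
    1996: [84, 80, 76, 76, 68, 62, 58],
    1995: [79, 83, 87, 90, 89, 86, 81],
    1994: [56, 57, 68, 73, 79, 80, 84],
}


def mean_of_weekly_means(weeks):
    """Average the three weekly averages, as the two calculating groups did."""
    return mean([mean(days) for days in weeks.values()])


def daily_range(weeks):
    """Return the lowest and highest daily temperature across all weeks.

    This is an assumed, illustrative way to express predictability.
    """
    all_days = [temp for days in weeks.values() for temp in days]
    return min(all_days), max(all_days)


for name, weeks in (("September", september), ("April", april)):
    print(name, mean_of_weekly_means(weeks), daily_range(weeks))
```

Under these assumptions, April has the higher overall average (76 versus 75), but its daily temperatures run from 56 to 90 degrees, while those for September stay between 69 and 80 degrees. 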

Whole-class discussion. The teacher began the whole-class discussion by asking groups to 
offer arguments for taking the trip in either September or April. 



Student 1: Our group chose to go in September because it’s not... there’s not like really big 
drastic temperature changes; whereas in April it goes from... I know in ’94 it went 
from like 56 all the way to 80 degrees in one week. So that’s why we chose 
September, it has pretty much the same temperature. 

Teacher: OK, it’s kind of stable (writes on board). Alright, someone else? 

Student 2: We say September because the weather is steady... like and the average weather in 
April for these three... like it went from in ’96 it was 72, ’95 - 85, ’94 - 71. So you 
can really predict what kind of weather it’s gonna be [in September] it was 75, 74, 
76 so you know it’s going to be around 70-something and... it’s just too cold [in 
April]... 56 and 57 degrees... you want a warm weather... you don’t want to be 
wearing no coat and stuff. 

Teacher: You don’t want to have to pack all that stuff do you? OK... What about the fact that 
this one (points to April on the board) has a higher average. Did you calculate the 
overall average? It has a higher overall average, so shouldn’t it be warmer then? 



Even though two groups did approach this task by calculating an average, this way of 
reasoning about the task did not emerge from students’ arguments. Although the teacher brought 
this issue up in the whole-class discussion, the subsequent interchange was minimal. In the 
ensuing discussion, one student suggested that a one-degree difference was not that significant and 
another student suggested that the unusual 85-degree average “kicked the average up” for April. 
This lack of interest in the mean is interesting when contrasted with the Basketball All-Star task, where 
the majority of groups calculated the mean and the mean was explicitly included in the arguments 
during the whole-class discussion. This could be accounted for by the fact that this task seemed to 
encourage qualitative judgments rather than calculations. For example, in the Basketball All-Star 
task, some students found the total number of points to make a decision. In this task, adding the 
temperatures to find a total did not make sense to students. In their out-of-school practices, 
students do not add daily temperatures to make a decision. Instead, students made qualitative 
judgments based on the inspection of individual data points. Another possible explanation for the 
lack of interest in the mean could be related to the fact that this was the last task presented to these 
students. In the earlier tasks we saw the impact of prior school mathematics on students’ 
reasoning. For example, we pointed out that students seemed to view the first task as a typical 
school mathematics task involving a procedure of categorizing data points. Similarly, the second 
task appeared to be interpreted as a task about the procedure of making a graph. In the third task, 
the majority of students reasoned that the mean was the best summary of the data set regardless of 
the situation. It is possible that, after interacting with the teacher and their peers for three days on 
these non-school-like tasks, students began to reinterpret the social situation of the 
mathematics classroom and began to construct new beliefs about what was expected of them in this 
situation. 
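
The student’s remark that the 85-degree week “kicked the average up” for April can be examined with the weekly averages given in Figure 8. The following is a minimal sketch, not part of the original analysis. The function name `average_excluding` is hypothetical, and setting aside the 1995 week is an assumption made only to illustrate the student’s point.

```python
from statistics import mean

# Weekly averages for April as given in the task (Figure 8).
april_weekly_averages = {1996: 72, 1995: 85, 1994: 71}


def average_excluding(weekly_averages, excluded_year):
    """Average the weekly averages after setting aside one year's week."""
    kept = [avg for year, avg in weekly_averages.items() if year != excluded_year]
    return mean(kept)


# Average of all three April weeks, then of the two weeks other than 1995.
with_1995 = mean(list(april_weekly_averages.values()))
without_1995 = average_excluding(april_weekly_averages, 1995)
print(with_1995, without_1995)
```

Under this assumption, the April average falls from 76 to 71.5 once the 1995 week is set aside, which is below the September average of 75. 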



Conclusion: Implications for Instructional Design 
As stated earlier, our purpose in this paper was to document the analysis of classroom 
performance assessments that we conducted as part of the pilot work for our current research 
project. We were interested in gathering information about students’ current statistical 
understandings in order to develop instructional materials. The tasks were designed to provide us 
with information about students’ current understandings of (1) mean and (2) graphical 
representations. 

In our analysis we found that students typically viewed the mean as a procedure that was to 
be used to summarize a group of numbers regardless of the task situation. Data analysis for these 
students meant “doing something with the numbers” which was grounded in their prior school 
mathematics experiences. Similarly, we found that students’ conversations about graphical 
representations highlighted the procedures for constructing graphs with no attention to what the 
graph signified and how that related to the task situation. These findings will have important 
implications for us as we design instructional sequences for our current research project. 

It will be crucial for us to help students develop a sense of data analysis as being more than 
“doing something with the numbers.” We can begin to do this by creating task situations that are 
relevant to middle school students. Of course, tasks that are interesting to students are not enough. 
We found in our analysis that interesting, non-school task situations were also proceduralized by 
the students. We will need to spend time building a strong context in order to pull students away 
from their procedural orientation. In establishing this context, it will be important to immerse 
students in the situation by discussing the problem or issue being investigated. These discussions 
can help students clarify the significance of the problem, identify aspects of the problem that could 
be measured, and consider ways those measurements could be made. After this extensive 
orientation to the task, we hope that students will be more grounded in the situation and we can 
then support a shift in their reasoning towards data analysis as inquiry rather than procedure. 